50 E2E Test Scenarios - Bug Analysis and Fixes
**Date:** 2026-02-09
**Test Script:** scripts/test_50_scenarios.py
**Results:** 72% pass rate (36/50 tests)
---
Test Results Summary
| Category | Pass Rate | Status |
|---|---|---|
| Auth | 5/5 (100.0%) | ✅ Perfect |
| Agent | 5/7 (71.4%) | ⚠️ Issues |
| Execution | 6/6 (100.0%) | ✅ Perfect |
| Graduation | 5/7 (71.4%) | ⚠️ Issues |
| Episodes | 4/5 (80.0%) | ⚠️ Minor issues |
| Admin | 2/5 (40.0%) | ❌ Major issues |
| Errors | 5/5 (100.0%) | ✅ Perfect |
| Edge Cases | 0/5 (0.0%) | ❌ All failed |
| Performance | 3/3 (100.0%) | ✅ Perfect |
| Integration | 1/2 (50.0%) | ⚠️ Issues |
---
Root Cause Analysis
1. HTTP 429 Errors - Quota Enforcement (Not Rate Limiting)
**Tests Affected:**
- [Agent] Enforce quota limits - "Quota enforced at agent 7"
- [Agent] Create agent with all capabilities - HTTP 429
- [Edge Cases] Handle unicode/special chars - HTTP 429
- [Edge Cases] Handle long agent names - HTTP 429
- [Edge Cases] Handle invalid maturity level - HTTP 429
**Root Cause:**
The HTTP 429 errors are from QuotaManager.check_agent_quota() which raises HTTP 429 when the agent limit is reached, NOT from rate limiting.

    # backend-saas/core/quota_manager.py:99
    raise HTTPException(
        status_code=429,
        detail=f"Agent limit reached ({agent_count}/{limit}). Please upgrade your plan."
    )

**Issue:**
- Solo plan allows 10 agents
- Test creates 6 agents in basic agent tests
- Previous test runs may have left agents in database
- No cleanup between runs causes accumulation
**Quota Limits:**

    free: 3 agents
    solo: 10 agents (QuotaManager.QUOTAS["solo"]["max_agents"])
    team: 25 agents
    enterprise: 1000 agents

**Fix Required:**
- Add test cleanup to delete agents after each test run
- Or use unique tenant subdomain for each test run
- Or increase solo plan quota for testing
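
The first two options can be combined in the test harness. The following is a minimal sketch, not the project's actual helper: `unique_subdomain`, `tracked_agents`, and the `delete_agent` callable are hypothetical names, and the sketch assumes the backend exposes some way to delete an agent that the caller wraps in `delete_agent`.

```python
import time
import uuid
from contextlib import contextmanager


def unique_subdomain(prefix: str = "test") -> str:
    """Build a per-run tenant subdomain: prefix, Unix timestamp, random suffix."""
    return f"{prefix}-{int(time.time())}-{uuid.uuid4().hex[:6]}"


@contextmanager
def tracked_agents(delete_agent):
    """Collect agent IDs created during a run and delete them on exit.

    `delete_agent` is a caller-supplied callable (hypothetical), for example
    a wrapper around whatever delete endpoint the backend provides.
    """
    created = []
    try:
        yield created
    finally:
        # Runs even if a test raised; newest agents are removed first.
        for agent_id in reversed(created):
            delete_agent(agent_id)
        created.clear()
```

A test run would append each new agent ID to the list yielded by `tracked_agents(...)`, so agents no longer accumulate toward the 10-agent solo limit between runs.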
---
2. Admin Creation Response Missing Role
**Test Affected:**
- [Admin] Create workspace admin - Role: N/A
**Root Cause:**
The create-admin endpoint returns TestAuthResponse (without role field) instead of AdminAuthResponse.

    # backend-saas/api/routes/test_auth_routes.py:271
    return TestAuthResponse(  # ❌ Missing role field
        user_id=str(user.id),
        tenant_id=str(tenant.id),
        test_token=test_token,
        email=user.email,
        name=user.name  # ❌ Should be user.first_name or user.email
    )

**Fix Applied:**
Changed return to AdminAuthResponse with role field:

    return AdminAuthResponse(
        user_id=str(user.id),
        tenant_id=str(tenant.id),
        test_token=test_token,
        token_type="test",
        email=user.email,
        name=user.first_name or user.email,
        role=user.role  # ✅ Now includes role
    )

**Status:** ✅ FIXED
---
3. Promotion/Demotion HTTP 500 Errors
**Tests Affected:**
- [Graduation] Promote agent with auth - HTTP 500
- [Graduation] Demote agent with auth - HTTP 500
- [Admin] Promote with JWT auth - HTTP 500
- [Admin] Demote with JWT auth - HTTP 500
**Root Cause:**
The test tries to promote an agent from one tenant using an admin user from a different tenant.

    # Test setup creates:
    self.tenant_id = "team-plan-tenant"    # From setup_tenant()
    self.admin_tenant_id = "admin-tenant"  # From setup_admin()
    self.agent_id = "agent-in-team-tenant"

    # Promotion test tries to:
    POST /api/graduation/agents/{agent_id}/promote
    Headers: {
        "Authorization": f"Bearer {self.admin_token}",  # Admin from admin-tenant
        "X-Tenant-ID": self.admin_tenant_id,            # Different tenant!
        "X-User-ID": self.admin_user_id
    }

**Backend Logic:**

    # backend-saas/api/routes/graduation_routes.py:369
    tenant_id = await extract_tenant_id(request)  # Gets admin_tenant_id
    user_id = await extract_user_id(request)
    # Tries to find agent in admin_tenant_id, but agent is in team-tenant
    # Returns 500 error when agent not found

**Fix Required:**
- Create admin user in the same tenant as the agent
- Or create a test agent in the admin tenant for promotion tests
- Update test to use same tenant for both agent and admin
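
One way to keep the agent and admin aligned is to derive every request header from a single tenant context. This is a hedged sketch only: `TenantContext` and `assert_same_tenant` are hypothetical names, and the header names are the ones shown in the test setup above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenantContext:
    """Hypothetical holder for the one tenant used by both admin and agent."""
    tenant_id: str
    admin_user_id: str
    admin_token: str

    def headers(self) -> dict:
        # All three headers come from the same context, so they cannot drift apart.
        return {
            "Authorization": f"Bearer {self.admin_token}",
            "X-Tenant-ID": self.tenant_id,
            "X-User-ID": self.admin_user_id,
        }


def assert_same_tenant(agent_tenant_id: str, ctx: TenantContext) -> None:
    """Fail fast in the test instead of letting the backend return HTTP 500."""
    if agent_tenant_id != ctx.tenant_id:
        raise AssertionError(
            f"agent tenant {agent_tenant_id!r} != admin tenant {ctx.tenant_id!r}"
        )
```

Calling `assert_same_tenant` before a promote or demote request surfaces the mismatch as a clear test failure rather than as a server error.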
---
4. Episode Feedback Test Flow Issue
**Test Affected:**
[Episodes] Submit episode feedback- "Failed to create episode"
**Root Cause:**
The test tries to manually create an episode before submitting feedback, but the episode creation endpoint requires a valid execution_id.

    # Test flow:
    1. Create episode with POST /api/test/episodes/create  # ❌ This endpoint doesn't exist
    2. Submit feedback to created episode                   # Never reaches here

**Correct Flow:**
Episodes are automatically created during agent execution. The test should:
- Execute an agent skill (creates episode)
- Get the episode_id from execution response
- Submit feedback for that episode
**Fix Required:**
Update test to use real agent execution flow:

    import requests

    # BASE_URL, agent_id, and headers (auth and tenant headers) come from the test setup.

    # 1. Execute agent (creates episode)
    exec_response = requests.post(
        f"{BASE_URL}/api/test/agents/{agent_id}/execute",
        json={"skill_name": "read", "params": {"query": "test"}},
        headers=headers,
    )
    exec_response.raise_for_status()
    execution_id = exec_response.json()["execution_id"]

    # 2. Get episode from execution (most recent episode for the agent)
    episode_response = requests.get(
        f"{BASE_URL}/api/graduation/agents/{agent_id}/episodes?limit=1",
        headers=headers,
    )
    episode_response.raise_for_status()
    episode_id = episode_response.json()["episodes"][0]["id"]

    # 3. Submit feedback
    feedback_response = requests.post(
        f"{BASE_URL}/api/graduation/episodes/{episode_id}/feedback",
        json={"feedback_score": 0.8, "feedback_notes": "Great work!"},
        headers=headers,
    )

---
5. Edge Case Validation Errors
**Tests Affected:**
- [Edge Cases] Handle zero episode count - HTTP 422
- [Edge Cases] Handle concurrent creation - 0/3 successful
**Root Cause for Zero Episode Count:**
The readiness endpoint validates episode_count parameter:

    # backend-saas/api/routes/graduation_routes.py:45
    class ExamRequest(BaseModel):
        episode_count: int = Field(default=30, ge=10, le=100)  # ❌ Requires >= 10

**Fix Required:**
Test should use episode_count=10 (minimum valid value) instead of 0.
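
The accepted range can be made explicit in the test script so that boundary values are chosen deliberately. This sketch mirrors the `ge=10, le=100` constraint quoted above; the constant and function names are hypothetical and not part of the backend.

```python
# Bounds copied from the ExamRequest field constraint (ge=10, le=100).
MIN_EPISODE_COUNT = 10
MAX_EPISODE_COUNT = 100


def is_valid_episode_count(n: int) -> bool:
    """Return True when n falls inside the range the endpoint accepts."""
    return MIN_EPISODE_COUNT <= n <= MAX_EPISODE_COUNT


def boundary_cases() -> list:
    """Pair values on and just outside each bound with their expected validity."""
    return [(n, is_valid_episode_count(n)) for n in (0, 9, 10, 100, 101)]
```

Values flagged as invalid are the ones for which the endpoint is expected to answer HTTP 422, so a test could also assert that rejection explicitly instead of treating it as a failure.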
**Root Cause for Concurrent Creation:**
Concurrent agent creation requests may hit quota enforcement simultaneously before quota is updated.
**Fix Required:**
- Add delays between concurrent requests
- Or use sequential creation for reliability
- Or handle 429 responses and retry after delay
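
The retry option could look like the sketch below. It is an illustration under assumptions: `call_with_retry` is a hypothetical helper, `send` is any zero-argument callable returning an object with a `status_code` attribute (such as a `requests` response), and the delays are arbitrary. Retrying only helps when the 429 is transient; a tenant that has truly reached its agent limit will keep receiving 429.

```python
import time


def call_with_retry(send, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    Waits base_delay, then 2x, then 4x, ... between attempts (exponential
    backoff). `sleep` is injectable so tests can avoid real waiting.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        # Do not sleep after the final attempt; just return the last 429.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

In the concurrent-creation test, each worker would wrap its create-agent request in `call_with_retry` and count a creation as successful only if the final status is not 429.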
---
Rate Limiting vs Quota Enforcement
Common Misconception
**HTTP 429** errors in this test are NOT from rate limiting:
| Feature | Rate Limiting | Quota Enforcement |
|---|---|---|
| Source | AbuseProtectionService.checkRateLimit() | QuotaManager.check_agent_quota() |
| Storage | Redis (sliding window) | PostgreSQL (persistent count) |
| Bypass | X-Test-Secret header | No bypass (hard limit) |
| Error Code | 429 (from middleware) | 429 (from quota check) |
| Limit | Requests per minute (60-6000) | Total agents (3-1000) |
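
Because both paths return the same status code, a test can tell them apart only by the response body. The sketch below matches the quota message format quoted earlier ("Agent limit reached (count/limit)"); `classify_429` is a hypothetical helper, and it assumes the `detail` string from the error response is passed in by the caller.

```python
import re

# Matches the detail text raised by the quota check, e.g. "Agent limit reached (10/10)".
_QUOTA_DETAIL = re.compile(r"Agent limit reached \((\d+)/(\d+)\)")


def classify_429(status_code: int, detail: str):
    """Label a response as quota-related, another kind of 429, or not a 429.

    Returns (label, info) where info is (agent_count, limit) for quota errors
    and None otherwise.
    """
    if status_code != 429:
        return ("not_429", None)
    match = _QUOTA_DETAIL.search(detail or "")
    if match:
        return ("quota", (int(match.group(1)), int(match.group(2))))
    return ("rate_limit_or_other", None)
```

Logging the returned label alongside each failed test makes it clear in the results whether a 429 came from agent accumulation or from some other limiter.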
Verification
The test endpoints are exempt from rate limiting:

    # backend-saas/core/security/__init__.py:30
    if any(path.startswith(prefix) for prefix in self.exempted_prefixes) or test_secret:
        return await call_next(request)  # Bypass rate limiting

    self.exempted_prefixes = [
        "/api/test",  # ✅ Test endpoints exempt
        ...
    ]

But quota enforcement still applies:

    # backend-saas/api/routes/test_auth_routes.py:381
    QuotaManager.check_agent_quota(tenant_id, db)  # ❌ No bypass

---
Recommended Fixes
Priority 1: Test Agent Accumulation
- Add cleanup function to delete all test agents after test run
- Use unique tenant subdomain per run (e.g., test-{timestamp})
- Add agent count logging to debug quota issues
Priority 2: Promotion Test Cross-Tenant Issue
- Update test to create admin user in same tenant as agent
- Or create test agent in admin tenant for promotion tests
- Add tenant_id validation in promotion tests
Priority 3: Episode Feedback Test
- Update test to use real agent execution flow
- Get episode_id from execution response
- Submit feedback for real episode
Priority 4: Edge Case Tests
- Fix zero episode count to use minimum valid value (10)
- Add delays between concurrent requests
- Handle 429 responses with retry logic
---
Files Modified
Backend
- backend-saas/api/routes/test_auth_routes.py - Fixed admin creation response to include role
Test Script (Pending)
- scripts/test_50_scenarios.py - Needs updates for:
  - Admin/agent tenant alignment
  - Episode feedback flow
  - Edge case validation
  - Concurrent request handling
---
Next Steps
- ✅ **Fixed admin creation response** - Ships with the next deployment
- ⏳ **Update test script** - Fix promotion/feedback/edge case tests
- ⏳ **Add test cleanup** - Delete agents after each run
- ⏳ **Re-run tests** - Verify all fixes work correctly
- ⏳ **Document test patterns** - Create test development guidelines
---
Test Execution Command

    python3 scripts/test_50_scenarios.py

**Expected Results After Fixes:**
- Pass rate: 95%+ (up from 72%)
- All admin tests: Working
- All edge case tests: Working
- Episode feedback: Working